A Method of Automated Nonparametric Content Analysis for Social Science
Abstract
The increasing availability of digitized text presents enormous opportunities for social scientists. Yet hand coding many blogs, speeches, government records, newspapers, or other sources of unstructured text is infeasible. Although computer scientists have methods for automated content analysis, most are optimized to classify individual documents, whereas social scientists instead want generalizations about the population of documents, such as the proportion in a given category. Unfortunately, even a method with a high percent of individual documents correctly classified can be hugely biased when estimating category proportions. By directly optimizing for this social science goal, we develop a method that gives approximately unbiased estimates of category proportions even when the optimal classifier performs poorly. We illustrate with diverse data sets, including the daily expressed opinions of thousands of people about the U.S. presidency. We also make available software that implements our methods and large corpora of text for further analysis.
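The claim that an accurate classifier can still give biased category proportions can be checked with a little arithmetic. The sketch below is only an illustration under assumed numbers, not the method developed in the paper: the true proportion (10%), the sensitivity and specificity (both 90%), and all function names are hypothetical. It also shows the standard misclassification correction for a two-category problem, which assumes the error rates are known and carry over from a labeled set to the target population.

```python
def classify_and_count(true_prop, sensitivity, specificity):
    """Expected share of documents an imperfect classifier labels positive."""
    false_positive_rate = 1.0 - specificity
    return true_prop * sensitivity + (1.0 - true_prop) * false_positive_rate


def accuracy(true_prop, sensitivity, specificity):
    """Expected share of individual documents classified correctly."""
    return true_prop * sensitivity + (1.0 - true_prop) * specificity


def adjusted_count(observed_prop, sensitivity, specificity):
    """Invert the misclassification relation to recover the true proportion.

    Assumes sensitivity and specificity are known (e.g. estimated on a
    hand-coded set) and that sensitivity + specificity != 1.
    """
    false_positive_rate = 1.0 - specificity
    return (observed_prop - false_positive_rate) / (sensitivity - false_positive_rate)


# Hypothetical numbers chosen for illustration only.
true_prop = 0.10
sens = 0.90
spec = 0.90

observed = classify_and_count(true_prop, sens, spec)
acc = accuracy(true_prop, sens, spec)
corrected = adjusted_count(observed, sens, spec)

print(f"accuracy on individual documents: {acc:.2f}")
print(f"classify-and-count proportion:    {observed:.2f}")
print(f"corrected proportion:             {corrected:.2f}")
```

With these assumed values, 90% of individual documents are classified correctly, yet simply counting classifier labels reports 18% in the category rather than the true 10%. The false positives drawn from the large negative class outnumber the missed positives, so the errors do not cancel.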
Similar resources
A Novel Method for Automated Estimation of Effective Parameters of Complex Auditory Brainstem Response: Adaptive Processing based on Correntropy Concept
Objectives: Automated Auditory Brainstem Responses (ABR) peak detection is a novel technique to facilitate the measurement of neural synchrony along the auditory pathway through the brainstem. Analyzing the location of the peaks in these signals and the time interval between them may be utilized either for analyzing the hearing process or detecting peripheral and central lesions in the human he...
Novel Automated Method for Minirhizotron Image Analysis: Root Detection using Curvelet Transform
In this article a new method is introduced for distinguishing roots and background based on their digital curvelet transform in minirhizotron images. In the proposed method, the nonlinear mapping is applied on sub-band curvelet components followed by boundary detection using energy optimization concept. The curvelet transform has the excellent capability in detecting roots with different orient...
Analysis of Users’ Opinions about Reasons for Divorce
One of the most important issues related to knowledge discovery is the field of comment mining. Opinion mining is a tool through which the opinions of people who comment about a specific issue can be evaluated in order to achieve some interesting results. This is a subset of data mining. Opinion mining can be improved using the data mining algorithms. One of the important parts of opinion minin...
Gender-based Differences in Associations between Attitude and Self-esteem with Smoking Behavior among Adolescents: A Secondary Analysis Applying Bayesian Nonparametric Functional Latent Variable Model
Background: Different patterns of gender-based relationships between attitude toward smoking and self-esteem with smoking behavior have been reported. However, such associations may be much more complex than a simply supposed linear relationship. We aimed to propose a method of providing hand details on the total and gender-based scenarios of the relationships between attitude toward smoking and sel...
Incorporating nonparametric statistics into Delphi studies in library and information science
Introduction. The Delphi technique is widely used in library and information science research. However, many researchers in the field fail to employ standard statistical tests when using this technique. This makes the technique vulnerable to criticisms of its reliability and validity. The general goal of this article is to explore how nonparametric statistical techniques could mitigate this dra...